For an annotated markdown of the analysis code, see: https://sbresnahan.github.io/allele-specific-transcription-and-m6A/reports/PSm6A.html

Prerequisite files

The “src” directory contains files that are required for this analysis:

Additionally, the “results” directory contains files output by previous scripts that are required for this analysis:

Results files

The “results” folder contains results from this analysis:

Additionally, all scripts and code for the full analysis are available at https://github.com/sbresnahan/allele-specific-transcription-and-m6A.

Study design

This study seeks to use transcriptome data to understand behavioral differences between non-aggressive and aggressive honey bees. Two crosses were made. Cross A had AHB as the female parent (SRR14654188) and EHB as the male parent (SRR14654189). Cross B had EHB as the female parent (SRR14654186) and AHB as the male parent (SRR14654187). Three or more aggressive bees and three or more non-aggressive bees (labeled C and T) were collected from each cross. Twelve head and abdomen samples, spanning multiple batches over the course of a year, were ultimately analyzed, with three samples per experimental group. RNA samples were sequenced with Oxford Nanopore Technology direct RNA sequencing on a FLO-MIN106 flowcell.

In this report I describe the detection of and relationship between allele-specific transcription and RNA m6A, as well as their relationship to gene expression and differential isoform usage.

Sample processing

FAST5 files from all samples were processed with Guppy 5.0.16 using the rna_r9.4.1_70bps_hac.cfg configuration (Oxford Nanopore Technologies) for base calling on a GPU. FASTQ files labeled “pass” were concatenated together to generate one FASTQ per sample. Minimap 2.211 was used for alignment to the Apis mellifera HAv3.1 reference genome (plus common viral genomes) in splice-aware mode with a kmer length of 14, using the published gene annotation to guide alignment. SAMtools 1.122 was used to sort and index the resulting BAM files. Flair 1.53 was then used to convert BAM files to BED format and correct intron-exon junctions based on the published annotation, then call isoforms across all samples. These isoforms included published transcripts and genes as well as novel transcripts and genes.

Finally, EpiNano4 was used to quantify read coverage and RNA m6A probability at parent SNPs within transcripts. combined with a custom R script (src/combine_results_2022-08-03.R) to subset the EpiNano output to the A sites at the center of RRACH motifs. Read coverage and m6A probability at each position within published and novel transcripts, separated by allele, were then combined to generate read count and m6A probability matrices.

Parent-of-origin allele-specific transcription

Transcripts with \(n < 2\) positions are filtered from the datasets. Storer-Kim binomial exact tests and generalized linear mixed effects models with interaction terms (GLIMMIX) are used to identify genes that exhibited parent-of-origin biased allelic expression following protocols for identifying significant variation in parent-of-origin gene expression described previously. 5 6 7

We identified a total of 31,078 unique transcripts in our samples, including 3,640 lincRNAs. Of these, 3,081 (9.91%) were shared between both crosses and sufficiently varied in their sequence composition between the parents of each cross, allowing for identification of parent-of-origin reads in the offspring. Specifically, we identified 35,782 SNP positions within transcripts that had at least 2 SNPs. Of the 3,081 transcripts, 2,584 were from the published annotation, and 497 were novel. After filtering SNP positions with low read counts in the offspring, our dataset contained read counts at 18,674 positions distributed among 1,928 transcripts in non-aggressive bees and 33,494 positions distributed among 2,820 transcripts in aggressive bees.

We identified 307 transcripts that were consistently biased in both crosses, including 228 from previously annotated genes, 66 novel transcripts, and 13 lincRNAs. Some transcripts showed biased allelic expression in both non-aggressive and aggressive bees (\(n = 11\)), whereas others were only biased in one group (non-aggressive only, \(n = 54\); aggressive only, \(n = 242\)). In support of our hypothesis, we found that paternally expressed genes were enriched in aggressive bees. Interestingly, maternally expressed genes were also enriched in aggressive bees, although there were fewer maternally expressed genes.

Parent-of-origin allele-specific RNA m6A

A Two-tailed, unpooled z-test of two proportions was conducted for each transcript for each SNP position to test for differences between maternal and paternal m6A probability among the samples. Z-test results were then FDR corrected and aggregated by transcript. For each transcript, all positions were required to exhibit the same direction of parent- or lineage-of-origin bias as described above.

We identified 86 transcripts that showed consistent parent or lineage biases in RNA m6A in both crosses, including 71 from previously annotated genes, 14 novel transcripts, and 1 lincRNA. Some transcripts showed biased RNA m6A in both non-aggressive and aggressive workers (\(n = 9\)), whereas others were only biased in one group (non-aggressive only, \(n = 32\); aggressive only, \(n = 45\)). In contrast to transcription, there was no relationship between parent-specific RNA m6A and offspring aggression.

Compare PSGE to PSm6A, and DElincRNAs & isoform switching

We find few genes that show both parent-of-origin allele-specific transcription and m6A, and few that show either that were differentially expressed and none that exhibited isoform switching.

Session info

sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] doSNOW_1.0.20      snow_0.4-4         doParallel_1.0.17  iterators_1.0.14  
##  [5] foreach_1.5.2      kableExtra_1.3.4   gridExtra_2.3      viridis_0.6.2     
##  [9] viridisLite_0.4.1  car_3.1-1          carData_3.0-5      lmerTest_3.1-3    
## [13] lme4_1.1-31        Matrix_1.4-0       tryCatchLog_1.3.1  Rfast_2.0.6       
## [17] RcppZiggurat_0.1.6 Rcpp_1.0.9         plyr_1.8.8         forcats_0.5.2     
## [21] stringr_1.5.0      dplyr_1.0.10       purrr_1.0.0        readr_2.1.3       
## [25] tidyr_1.2.1        tibble_3.1.8       ggplot2_3.4.0      tidyverse_1.3.2   
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-155         fs_1.5.2             lubridate_1.9.0     
##  [4] webshot_0.5.4        httr_1.4.4           numDeriv_2016.8-1.1 
##  [7] tools_4.1.3          backports_1.4.1      bslib_0.4.2         
## [10] utf8_1.2.2           R6_2.5.1             DBI_1.1.3           
## [13] colorspace_2.0-3     withr_2.5.0          tidyselect_1.2.0    
## [16] compiler_4.1.3       cli_3.5.0            rvest_1.0.3         
## [19] formatR_1.13         xml2_1.3.3           sass_0.4.4          
## [22] scales_1.2.1         systemfonts_1.0.4    digest_0.6.31       
## [25] minqa_1.2.5          svglite_2.1.0        rmarkdown_2.19      
## [28] pkgconfig_2.0.3      htmltools_0.5.4      highr_0.10          
## [31] dbplyr_2.2.1         fastmap_1.1.0        rlang_1.0.6         
## [34] readxl_1.4.1         rstudioapi_0.14      jquerylib_0.1.4     
## [37] generics_0.1.3       jsonlite_1.8.4       googlesheets4_1.0.1 
## [40] magrittr_2.0.3       futile.logger_1.4.3  munsell_0.5.0       
## [43] fansi_1.0.3          abind_1.4-5          lifecycle_1.0.3     
## [46] stringi_1.7.8        yaml_2.3.6           MASS_7.3-55         
## [49] grid_4.1.3           crayon_1.5.2         lattice_0.20-45     
## [52] haven_2.5.1          splines_4.1.3        hms_1.1.2           
## [55] knitr_1.41           pillar_1.8.1         boot_1.3-28         
## [58] codetools_0.2-18     futile.options_1.0.1 reprex_2.0.2        
## [61] glue_1.6.2           evaluate_0.19        lambda.r_1.2.4      
## [64] modelr_0.1.10        vctrs_0.5.1          nloptr_2.0.3        
## [67] tzdb_0.3.0           cellranger_1.1.0     gtable_0.3.1        
## [70] assertthat_0.2.1     cachem_1.0.6         xfun_0.36           
## [73] broom_1.0.2          googledrive_2.0.0    gargle_1.2.1        
## [76] timechange_0.1.1     ellipsis_0.3.2

References


  1. Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191↩︎

  2. Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li (2021). Twelve years of SAMtools and BCFtools. GigaScience, Volume 10, Issue 2, giab008, https://doi.org/10.1093/gigascience/giab008↩︎

  3. Tang, A.D., Soulette, C.M., van Baren, M.J. et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat Commun 11, 1438 (2020). https://doi.org/10.1038/s41467-020-15171-6↩︎

  4. Liu, H., Begik, O., Lucas, M.C. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun 10, 4079 (2019). https://doi.org/10.1038/s41467-019-11713-9↩︎

  5. Sarah D Kocher, Jennifer M Tsuruda, Joshua D Gibson, Christine M Emore, Miguel E Arechavaleta-Velasco, David C Queller, Joan E Strassmann, Christina M Grozinger, Michael R Gribskov, Phillip San Miguel, Rick Westerman, Greg J Hunt, A Search for Parent-of-Origin Effects on Honey Bee Gene Expression, G3 Genes|Genomes|Genetics, Volume 5, Issue 8, 1 August 2015, Pages 1657–1662, https://doi.org/10.1534/g3.115.017814↩︎

  6. Galbraith, D. A., Kocher, S. D., Glenn, T., Albert, I., Hunt, G. J., Strassmann, J. E., … & Grozinger, C. M. (2016). Testing the kinship theory of intragenomic conflict in honey bees (Apis mellifera). Proceedings of the National Academy of Sciences, 113(4), 1020-1025. https://doi.org/10.1073/pnas.1516636113↩︎

  7. Galbraith, D. A., Ma, R., & Grozinger, C. M. (2021). Tissue‐specific transcription patterns support the kinship theory of intragenomic conflict in honey bees (Apis mellifera). Molecular ecology, 30(4), 1029-1041. https://doi.org/10.1111/mec.15778↩︎